Logic

Background: Dust entering the Sierras comes from both regional and global sources – and the contribution of each source changes over time. There is evidence that microbial communities can differ based on dust source and timing of dust deflation, and that this can have subsequent effects on endophytic associations (Caroline Frank’s work) and soil microbial communities at the deposition location. We therefore wanted to know:

Q1. Do dust-associated microbial communities entering the Sierra Nevada differ across space and time?

Outputs:

1. Determine the number of new OTUs identified at each site per sampling event

2. NMDS of unweighted unifrac or jaccard dissimilarity. PERMANOVA with PERMDISP.

Limitation: our time comparisons may be skewed because we did not sterilize collectors in between sampling events. Time points may appear more similar to each other than in reality. It’s also important to run on presence-absence data only; we don’t want to capture community differences that are driven by post-depositional divergence in relative abundances.

# wd is set to /RdataDust/DustMS_Rmarkdown which contains this file and all other linked files (or should)
getwd()
## [1] "/Users/colleennell/Dropbox/rstats/consulting/Mia/RdataDust/DustMS_Rmarkdown"
# i made a folder in this directory named data where I left it
list.files('data')
##  [1] "DustFungi.guilds.txt"             "DustFungi.otu_table.taxonomy.txt"
##  [3] "DustFungi.otu_table.txt"          "filtered_table_w_metadata.csv"   
##  [5] "filtered_table_w_metadata.txt"    "filtered_table_w_metadata.xlsx"  
##  [7] "SierraMap6.csv"                   "SierraMap6.xlsx"                 
##  [9] "unweighted_unifrac_dm.txt"        "weighted_unifrac_dm.txt"
map6<-read.csv("data/SierraMap6.csv", header = TRUE, col.names =   c("SampleID","BarcodeSequence","LinkerPrimerSequence","Year","Month","SiteCode","RepNum","SiteRep","Site","Elevation","DateCode","DescName","Description"))

# print df
map6
# I don't have the same files referenced in DustAnnotated1.1
B16S<-read.csv("data/filtered_table_w_metadata.csv", header = TRUE, stringsAsFactors = FALSE)
B16S

Split taxonomy

# split taxa groupings into new columns
source('dust_functions.R')
splitdf<-split_taxa(B16S)
head(splitdf)

Read in unifrac distances

list.files('data')
##  [1] "DustFungi.guilds.txt"             "DustFungi.otu_table.taxonomy.txt"
##  [3] "DustFungi.otu_table.txt"          "filtered_table_w_metadata.csv"   
##  [5] "filtered_table_w_metadata.txt"    "filtered_table_w_metadata.xlsx"  
##  [7] "SierraMap6.csv"                   "SierraMap6.xlsx"                 
##  [9] "unweighted_unifrac_dm.txt"        "weighted_unifrac_dm.txt"
unifrac<-read.table('data/unweighted_unifrac_dm.txt')
unifrac_wt<-read.table('data/weighted_unifrac_dm.txt')
#rows and col reflect pairwise distances?
str(unifrac)
## 'data.frame':    31 obs. of  31 variables:
##  $ OS115: num  0 0.802 0.811 0.733 0.842 ...
##  $ AP114: num  0.802 0 0.7 0.787 0.687 ...
##  $ JJ415: num  0.811 0.7 0 0.748 0.566 ...
##  $ OT115: num  0.733 0.787 0.748 0 0.788 ...
##  $ JJ515: num  0.842 0.687 0.566 0.788 0 ...
##  $ OS615: num  0.746 0.859 0.844 0.777 0.878 ...
##  $ OJ615: num  0.779 0.711 0.627 0.714 0.659 ...
##  $ JT415: num  0.769 0.669 0.618 0.73 0.646 ...
##  $ JP315: num  0.698 0.768 0.784 0.734 0.827 ...
##  $ OJ514: num  0.789 0.646 0.598 0.735 0.619 ...
##  $ OJ515: num  0.793 0.757 0.65 0.693 0.691 ...
##  $ OJ415: num  0.764 0.753 0.657 0.69 0.707 ...
##  $ AS114: num  0.738 0.731 0.743 0.748 0.773 ...
##  $ JS415: num  0.753 0.876 0.858 0.811 0.901 ...
##  $ JP414: num  0.746 0.681 0.733 0.759 0.768 ...
##  $ OT515: num  0.741 0.803 0.782 0.621 0.817 ...
##  $ JJ114: num  0.83 0.671 0.597 0.789 0.565 ...
##  $ OT415: num  0.74 0.808 0.774 0.613 0.819 ...
##  $ OT414: num  0.736 0.683 0.69 0.722 0.725 ...
##  $ JP515: num  0.699 0.766 0.79 0.732 0.836 ...
##  $ JT115: num  0.77 0.652 0.624 0.743 0.635 ...
##  $ AJ414: num  0.705 0.813 0.8 0.745 0.841 ...
##  $ OP315: num  0.697 0.811 0.806 0.74 0.853 ...
##  $ OS214: num  0.74 0.744 0.765 0.763 0.791 ...
##  $ JT314: num  0.772 0.668 0.685 0.755 0.684 ...
##  $ OP515: num  0.702 0.798 0.797 0.733 0.839 ...
##  $ AT414: num  0.791 0.689 0.683 0.738 0.706 ...
##  $ OS415: num  0.582 0.786 0.801 0.734 0.831 ...
##  $ OP615: num  0.682 0.79 0.812 0.74 0.84 ...
##  $ OP614: num  0.743 0.81 0.81 0.767 0.85 ...
##  $ JS115: num  0.667 0.821 0.818 0.774 0.864 ...
## plot otu unifrac dissimilarities as heatmap
# first melt df so there is a new row for every pairwise combo
unifrac.melt<-unifrac%>%melt(variable.name='otu_1')%>%mutate(otu_2 = rep.int(colnames(unifrac), times=length(colnames(unifrac))))
# dark color indicates samples that were more similar to one another based on unifrac dissimilarity, blue = contains no similarities  
##reorder axis so grouped by more similar samples
ggplot(data = unifrac.melt, aes(x = reorder(otu_1, value), y = reorder(otu_2, value)))+ 
  geom_tile(aes(fill = value))+
  scale_fill_gradient2(low = 'midnightblue', mid='deepskyblue3', high='yellow', midpoint = .5)+
  theme(axis.text.x = element_text(angle=90))+
  labs(x='Sample', y='Sample')

# make the same plot for weighted unifrac, compare

Q2. If differences exist, are there particular taxa that contribute disproportionately to those differences?

Note: Can’t run any analyses that rely on relative abundance data (e.g., simper), Address using absence-presence data

Outputs:

1. Indicator species analysis to identify taxon-habitat association patterns.

This procedure identifies OTUs as indicator species independently from their abundance in the total data set. ####Steps (from https://www.nature.com/articles/ismej2015238):

  1. single- and doubleton OTUs are removed as they hold little indicator informationusing the multipatt function (number of permutations=9999) implemented in the indicspecies R package (De Cáceres et al., 2010).
  2. To account for multiple testing, P-values were corrected by calculating false discovery rates (q-values) with the q-value function implemented in the BiocLite R package (Dabney and Storney, 2014). Indicator OTUs with q<0.05 were considered significant.
  3. Indicator taxa were represented in bipartite networks by using the edge-weighted spring embedded algorithm layout implemented in Cytoscape v.3.0.2 (Shannon et al., 2003) where point biserial correlation values, a measure of species–habitat association, were used to weight the edges between nodes constituting the habitats and indicator OTUs (Hartmann et al., 2015).
  4. We further mapped these indicator OTUs on taxonomic networks generated in Cytoscape v.3.0.2 to investigate potential taxa–habitat associations (Hartmann et al., 2015). I
  5. Indicator OTUs classified at the genus level were displayed in a taxonomic tree generated in iTOL (Letunic and Bork, 2011) together with the positive point biserial correlation values associated with each habitat.
  6. To find patterns of co-occurrence among OTUs characteristic of the habitats studied, we analysed correlations among the relative abundances of all indicator OTUs (co-correlations) by calculating Spearman rank correlation values (Spearman, 1904) in R with the function corr.test implemented in the package psych. Multiple testing was accounted for by correcting P-values with a false discovery rates of 5% (q-value<0.05).
  7. Bacterial and fungal indicator OTUs that were significantly co-correlated were displayed in networks with the edge-weighted spring embedded algorithm layout implemented in Cytoscape where Spearman correlation values were used to weight the edges between the nodes representing the OTUs (Hartmann et al., 2015).”

it teruherfgbwegrgterfghgfhgf

Q3. Can the observed variation in XXX be explained by provenance? By nutrient composition?

Outputs:

1. Multiple regression on distance matrix